feat(controller): permit setting GUNICORN_WORKERS #75
Conversation
did I tag this correctly? it's a charts-only change...
... it's not charts-only anymore
This looks good. Have you tested it by spinning up a controller with your changes connected to the database? It looks good, but we should test it just in case the charts broke. You should be able to package the controller charts that have changed. Define a values.yml and pass in the
I will give it a shot and report back. Thanks for the guidance!
Hmm, that might have worked in a vacuum, but I tried upgrading my existing deployment and ran into a number of issues. For one thing, slugrunner-config appears to be a dependency of controller (I did not know that). I also noticed that doing a deploy this way without going through deisrel, directly from a branch of the git repo, still reaches out to
I have a feeling that the patch is fine, but I have a homework assignment now and it will take some time to verify it. Meanwhile I noticed this one thing missing that should really be added... ade2319
@kingdonb The best approach would be to deploy Hephy Workflow 2.19.4 on minikube and then run the helm commands above. You will also need to build a new Docker image, push it somewhere, and update the charts to pull that custom image that contains the change. Also, the build was failing and I restarted it. Let's see...
Hey @kingdonb, do you think you will have some time to take a look at this again?
Yes, I would like to... but I don't know if I will be able to get to it before 11 days from now, when the Open Roadmap is scheduled. I don't know if someone else can test it? I'm setting a reminder to check on this in 4 days; hopefully by then I'll be able to spend the time and test it. I just can't right now.
I can also try to test the PR at some point if I can get to it, but I will leave it to you for now, as there are some other things that are more pressing.
I'm going to get around to this, let's say by Wednesday. If I don't, then you can rib me about it Thursday at the Open Roadmap meeting. I also said I would look at teamhephy/workflow-e2e#18
Whenever you have some free time to devote to this, @kingdonb. I know you have been busy. 🐝 EDIT: I am assuming this mostly just has to be tested, and as long as it doesn't break any existing functionality it should be good to merge in.
This time it passed the tests. I still haven't manually tried all four of the examples I wrote, just to be sure... maybe tomorrow.
# Setting the value to "None" will ensure that connections are never timed out.
# - 0
# - 600
# - "None"
What is the default value of conn_max_age when no value is set in the chart? Sounds like it will be "None" by default... Is that the correct behaviour we want?
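For what it's worth, Django's documented default for CONN_MAX_AGE is 0 (the connection is closed at the end of each request), not None. A minimal sketch of how the chart's string value might be translated into the Django setting; the helper name and the env var name are assumptions for illustration, not the controller's actual code:

```python
import os

def parse_conn_max_age(raw):
    """Translate a chart-supplied string into Django's CONN_MAX_AGE.

    Django's default is 0 (close after each request); the string
    "None" maps to Python's None, meaning connections never time out.
    """
    if raw is None or raw.strip() == "":
        return 0  # unset in the chart: fall back to Django's default
    if raw.strip().lower() == "none":
        return None  # persistent connections, never recycled
    return int(raw)  # otherwise: lifetime in seconds

# the env var name here is hypothetical
CONN_MAX_AGE = parse_conn_max_age(os.environ.get("DEIS_DATABASE_CONN_MAX_AGE"))
```

With a mapping like this, leaving the chart value unset would preserve Django's default behaviour rather than silently switching to unlimited-lifetime connections.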
OK, let me know when it has been tested sufficiently and can be merged.
Judging from Docker's original intent, running multiple processes in a single container is not recommended. So I suggest scaling containers instead of using multiple processes.
I think in the original issue, the problem was that database connections were not being closed when the number of running connections exceeded the maximum value of GUNICORN_WORKERS. The controller is the main client of the database, and we should be able to fine-tune how many worker processes it creates on systems with lots of CPUs:
Yes, generally scaling containers is better than scaling processes with container technologies. But here we are not totally in charge of how Django uses Gunicorn, so we are dealing with the Gunicorn server architecture model. I think we are correct in fine-tuning the workers with these two variables. This is very similar to how other process managers work: Resque-pool, Unicorn, Puma, PM2, and nginx, to name a few. This concurrent multi-processing of workers is actually a benefit inside containers, as it saves memory. If we have two workers and one master process, one worker sleeps while the other processes a request. The general benefit is that both workers share memory and resources passed down from the master process: at the very least, they share the dynamic libraries they use as well as the memory they reserve to process requests.
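The pre-fork master/worker arrangement described above can be sketched in a few lines. This is a simplified illustration of the model, not gunicorn's actual code, and it assumes a POSIX system where os.fork is available:

```python
import os

def prefork(num_workers, worker_loop):
    """Minimal pre-fork master: fork N workers, then reap them all.

    Each child runs worker_loop() and exits; the master only forks
    and waits, which is the essence of the pre-fork server model.
    """
    pids = []
    for _ in range(num_workers):
        pid = os.fork()
        if pid == 0:
            # child process: do the work, then exit immediately
            worker_loop()
            os._exit(0)
        pids.append(pid)  # parent (master): remember the child's PID
    for pid in pids:
        os.waitpid(pid, 0)  # master reaps each worker as it exits
    return pids
```

Because every worker is forked from the same master, read-only pages (loaded libraries, imported application code) are shared between them via copy-on-write, which is where the memory saving comes from.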
Agree with you. Python has the global interpreter lock, so the threading model is not a good idea; gevent is a good solution in high-IO situations. I wanted to use Gunicorn + gevent to run the controller, but then I thought it might not be necessary, because the controller should not be a performance bottleneck.
Yes, exactly. The Python GIL is part of the interpreter and there is no way around it. The controller is still IO-bound, so CPU is not the deciding factor in how many requests can be handled without blocking. If users want to run the controller on machines with many CPUs, they should be able to scale the gunicorn workers and fine-tune them, which would normally be useful for maximizing concurrency and parallelism. There is a good article on how the different types of workers behave in gunicorn.
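For choosing the worker count, the gunicorn documentation suggests (2 x cores) + 1 as a starting point. A hedged sketch of how a GUNICORN_WORKERS override might be honoured, falling back to that heuristic; the function is hypothetical, not the controller's actual entrypoint:

```python
import multiprocessing
import os

def gunicorn_workers(environ=os.environ):
    """Pick a worker count: honour GUNICORN_WORKERS when set,
    otherwise fall back to the (2 * cores) + 1 rule of thumb the
    gunicorn docs suggest as a starting point."""
    raw = environ.get("GUNICORN_WORKERS")
    if raw:
        return int(raw)  # explicit override from the chart/env
    return multiprocessing.cpu_count() * 2 + 1
```

On a 64-core machine the heuristic would spawn 129 workers, which is exactly the scenario where being able to cap the count from values.yaml matters.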
@@ -24,8 +24,6 @@
     'security.W008'
 ]

-CONN_MAX_AGE = 60 * 3
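Note that in Django the connection lifetime that the ORM actually reads lives inside the per-database dictionary in DATABASES, not in a standalone module constant. A hedged sketch (engine and database name are assumptions for illustration):

```python
# Hypothetical sketch: where Django's per-connection lifetime
# setting normally lives in settings.py.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "deis",           # database name is an assumption
        "CONN_MAX_AGE": 60 * 3,   # recycle connections after 3 minutes
    }
}
```

If the removed constant was only feeding this key, dropping it changes nothing as long as the DATABASES entry still sets CONN_MAX_AGE from somewhere.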
Does this need to be added back? I think this is a different CONN_MAX_AGE env variable?
Hi @kingdonb, have you tested this? What about setting the
and CONN_MAX_AGE from global values.yaml?
The controller values.yaml should also mention all of the settings that can affect the controller.
a965de8 to 4f93b5d
Building this image, rebased on the latest controller, gave me a headache (CrashLoopBackOff).
I have no idea what caused this; it doesn't look like it's related to my change, but I submitted a separate PR with a chart-only change that gets it done. We already had the gunicorn_workers code merged into the controller Python code; it only needed to be set as an environment variable from the chart/deployment spec. The loose end here is the CONN_MAX_AGE thing. I'll keep this open until we figure out whether a change is needed for that.
Seems to be failing at
Do you have some type of user or environment string with a special character? We must be breaking somewhere with the connection string to the db.
teamhephy#75 (review)
> I think this is a different CONN_MAX_AGE env variable?
I don't see where else this variable is used, but no other change is present which could have caused the tests to fail?
Closed in favor of #139
chore(oauth2): update user info pipeline
permit setting GUNICORN_WORKERS and CONN_MAX_AGE from global values.yaml
(an additional pull request to the umbrella 'workflow' chart will follow)